Nucleic Acids Research

Oxford University Press (OUP)

Preprints posted in the last 7 days, ranked by how well they match Nucleic Acids Research's content profile, based on 1128 papers previously published here. The average preprint has a 0.81% match score for this journal, so anything above that is already an above-average fit.

1
Locally adaptive conformal prediction intervals for polygenic score-based phenotype prediction via residual normalization and data-driven stratification

Yun, Y.; Hao, X.; Zhang, Y. D.

2026-05-30 genetic and genomic medicine 10.64898/2026.05.28.26354326 medRxiv
Top 8%
2.5%
Quantifying uncertainty in polygenic score (PGS)-based phenotype prediction is crucial for the integration of genomic data into precision medicine. While the PGS provides a fundamental pivot for point estimation, clinical decision-making necessitates the construction of well-calibrated prediction intervals that reliably encompass the true phenotypic values. However, phenotypic residuals are frequently characterized by complex heteroscedasticity and stratified variance structures across diverse demographic contexts. Existing approaches often rely on global calibration mechanisms, which fail to account for such localized variance structures and lead to systematic miscalibration within specific subpopulations. To bridge this gap, we propose Clustering-based Split Conformal Prediction with Normalized Residuals (C-SCNR), a versatile framework based on Split Conformal Prediction. By adopting residual normalization and incorporating a repetitive `split-and-cluster` mechanism, C-SCNR dynamically identifies latent error strata and applies fine-grained adjustments to the resulting intervals. Our framework requires no distributional assumptions regarding the phenotype, is compatible with any PGS method, and flexibly accommodates biologically-informed grouping. Simulation studies demonstrate that our framework consistently outperforms existing methods across diverse error distributions. In real-data applications analyzing Body mass index (BMI), Low-density lipoprotein (LDL) cholesterol, and High-density lipoprotein (HDL) cholesterol in the UK Biobank, C-SCNR effectively resolves the coverage deficiencies of existing methods in specific subgroups and consistently yields superior localized calibration. Overall, C-SCNR represents a flexible and powerful framework for constructing high-resolution context-specific prediction intervals, thereby facilitating more reliable clinical interpretations of polygenic risk.
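As a rough illustration of the ideas named in this abstract, the sketch below implements split conformal prediction with normalized residuals and an optional per-stratum quantile. It is not the authors' C-SCNR procedure: the repeated split-and-cluster step is omitted, and the function names, the `scale_*` arguments (an estimate of local residual spread supplied by the caller), and the group labels are all assumptions made for this example.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration points the valid interval is unbounded.
    return scores[k - 1] if k <= n else np.inf

def normalized_split_conformal(pred_cal, y_cal, scale_cal, pred_new, scale_new,
                               alpha=0.1, groups_cal=None, groups_new=None):
    """Prediction intervals from normalized residuals |y - pred| / scale.

    If group labels are given (hypothetical strata, e.g. from clustering),
    a separate quantile is calibrated within each group.
    """
    pred_cal, y_cal, scale_cal = (np.asarray(a, dtype=float)
                                  for a in (pred_cal, y_cal, scale_cal))
    pred_new = np.asarray(pred_new, dtype=float)
    scale_new = np.asarray(scale_new, dtype=float)

    scores = np.abs(y_cal - pred_cal) / scale_cal
    if groups_cal is None:
        q = np.full(pred_new.shape, conformal_quantile(scores, alpha))
    else:
        groups_cal = np.asarray(groups_cal)
        groups_new = np.asarray(groups_new)
        q = np.empty(pred_new.shape)
        for g in np.unique(groups_new):
            q[groups_new == g] = conformal_quantile(scores[groups_cal == g], alpha)

    half_width = q * scale_new  # wider intervals where the spread estimate is larger
    return pred_new - half_width, pred_new + half_width
```

Here `pred_*` would be the PGS-based point predictions on a held-out calibration set and on new individuals. Calibrating the quantile per group is what lets interval width adapt to strata with different residual variance, at the cost of fewer calibration points per stratum.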

2
Normative Speech Modeling for ALS Diagnosis with Application to Other Neurodegenerative Diseases

Shah, M.

2026-05-27 neurology 10.64898/2026.05.25.26354057 medRxiv
Top 11%
1.7%
Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease affecting more than 450,000 individuals worldwide and is frequently diagnosed more than 12 months after symptom onset, delaying intervention during a critical early window. Because up to 80% of patients develop dysarthria within two years, subtle changes in speech provide a signal of early bulbar motor neuron degeneration. However, existing speech-based systems rely on supervised classification trained on limited datasets, achieving moderate sensitivity and depending heavily on labeled disease examples, which restrict scalability and early detection. This study introduces SPEAK-NORM, the first-ever normative speech modeling framework for early ALS diagnosis, which learns age- and sex-conditioned motor-speech distributions exclusively from healthy individuals. A conditional variational autoencoder models coordination of hypoglossal, laryngeal, and respiratory motor pathways, and deviation from this healthy manifold is quantified through latent representations and reconstruction error to form a 354-dimensional profile. A calibrated linear Support Vector Machine performs subject-level classification under subject-disjoint validation. On the VOC-ALS database (n = 153), SPEAK-NORM achieves 98% accuracy with balanced sensitivity and specificity, significantly outperforming established clinical acoustic indices and prior systems. The framework maintains strong performance under cross-task generalization and when retrained on healthy controls in independent dementia and Parkinson disease cohorts, demonstrating disease-specific deviation patterns rather than generic neurodegenerative change. Spectral, temporal, and latent separations further support interpretability. 
By modeling healthy speech instead of memorizing disease examples, SPEAK-NORM enables scalable early neuromotor screening using recording devices, with potential to support earlier diagnosis, differential classification, and monitoring of ALS progression.

3
A TAD-informed aging-brain xQTL atlas of multi-modal and cell-type-resolved regulatory variation

Cifello, J.; Feng, R.; Grenn, F. P.; Carter, L.; Liu, A.; Sun, H.; Li, R.; Empawi, J. A.; Greenfest-Allen, E.; Katanic, Z.; Valladares, O.; Kuzma, A. B.; White, H.; Farrer, L. A.; Goate, A. M.; Raj, T.; Wang, M.; Cruchaga, C.; Wang, L.-S.; Klein, H.; De Jager, P. L.; Chen, H.; Marcora, E.; TCW, J.; Zhang, X.; Kuksa, P. P.; Wang, G.; Leung, Y. Y.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.21.26353713 medRxiv
Top 12%
1.4%
Understanding the regulatory consequences of genetic variation in the aging human brain requires molecular maps that span brain regions, cell types and regulatory modalities. We present the Alzheimer's Disease Sequencing Project Functional Genomics (FunGen-AD) xQTL Atlas, a harmonized resource of molecular quantitative trait loci from four postmortem brain studies, ROSMAP, MSBB, Knight-ADRC and MiGA. The atlas integrates histone acetylation, DNA methylation, gene expression, splicing and protein abundance QTLs across 14 brain regions, 7 major cell types and 17,566 samples, with standardized association, significance-filtered and fine-mapping outputs. To expand discovery beyond conventional 1-Mb cis windows, we include variants within Topologically Associating Domains (TAD) and their boundaries where appropriate, identifying on average 21% more variant-molecular-trait associations per dataset. Statistical fine-mapping reduced broad association sets by 95% into credible sets of candidate regulatory variants. Distributed through the NIAGADS xQTL portal and bulk-download services, the atlas provides a comprehensive functional-genomic foundation for interpreting genetic risk variants in Alzheimer's disease and aging-brain research.

4
Beyond Identifier Matching: An Empirical Characterization of Failure Modes in Biomedical Knowledge Graph Integration

Hu, S.; Cheng, H.; Gillenwater, L.; Manpearl, K.; Mandava, A.; Wang, Y.; Pividori, M.; Stranger, B.; Krishnan, A.; Greene, C.; Gao, Y.

2026-05-28 health informatics 10.64898/2026.05.26.26354182 medRxiv
Top 13%
1.2%
Objective. Biomedical knowledge graphs (KGs) such as PrimeKG, Hetionet, UMLS, and PharmGKB are increasingly used as the substrate for downstream machine-learning, retrieval-augmented generation, drug-repurposing, and electronic health record (EHR) augmentation pipelines. The dominant assumption in published work is that integrating two or more such KGs is a tractable engineering step solved by identifier (ID) matching. This paper interrogates that assumption empirically. We quantify how much concept overlap survives realistic alignment, and we characterize the new failure modes introduced by the methods that practitioners reach for when ID matching is insufficient. Materials and Methods. We compared four widely used biomedical KGs (PrimeKG, Hetionet v1.0, the full UMLS Metathesaurus, and PharmGKB) across eleven node types using a tiered alignment pipeline: (1) direct ID matching for nodes sharing a primary vocabulary; (2) cross-ontology bridging using standard mappings (e.g., MONDO-DOID, HPO-UMLS, HPO-UMLS-MeSH for side effects, NCBI Gene-HGNC-UMLS, UBERON-FMA/SNOMEDCT_US/NCI/MeSH for anatomy); (3) ClinicalBERT cosine-similarity grouping at threshold >= 0.98 for over-segmented disease nodes, with a deterministic suffix-stripping canonicalizer; (4) exact name matching for ontology-poor types (anatomy, REACTOME pathways); and (5) embedding-based fuzzy matching with UMLS lookup (SapBERT and ClinicalBERT) for free-text microbiome concepts. We applied the pipeline to a 698-concept gut-microbiome benchmark spanning taxa, pathways, and disease labels, validated grouping decisions against the curated SSSOM mappings released by the MONDO project, and audited the ClinicalBERT consolidation against five clinical-genetics case studies drawn from the literature. Results. Per-type pairwise coverage was strikingly asymmetric. 
Genes/proteins and the three Gene Ontology categories aligned cleanly across PrimeKG and Hetionet (mutual coverage 94-99%), but disease overlap was sparse: only 0.7% of PrimeKG individual disease nodes mapped to Hetionet, rising to 2.0% after MONDO grouping (versus 78.7% and 18.4% from the Hetionet side). PrimeKG-to-UMLS coverage spanned 100% (effect/phenotype via HPO) down to 20.8% (REACTOME pathways), with drugs at 73.7% and anatomy at 58.8%. PrimeKG-to-PharmGKB drug coverage required up to two bridging hops (DrugBank -> UMLS -> RxNorm/ATC/MeSH). Bigger was not uniformly more complete: on a 698-concept microbiome drug benchmark, Hetionet missed 0 concepts while PrimeKG missed 16. ClinicalBERT-based grouping consolidated 22,205 raw MONDO disease nodes into 17,080 groups but introduced three reproducible failure modes documented in case studies: (i) peer over-merging: for example, all 22 osteogenesis imperfecta subtypes collapsed into a single node despite distinct severity classes; (ii) parent-child collapse: e.g. acute myeloid leukemia merged with myeloid leukemia, erasing the acute/chronic distinction that drives clinical management; and (iii) lexical false positives: neurofibromatosis and schwannomatosis grouped together despite cellular-pathology differences. Discussion. Identifier matching alone is a weak baseline for biomedical KG integration. Cross-ontology bridges and embedding-based consolidation expand coverage but do so at the cost of clinically meaningful resolution, and the resulting failures are systematic rather than random. Reporting only aggregate coverage statistics obscures these losses, which propagate silently into downstream tasks. Conclusion. We provide reusable per-type coverage tables, a taxonomy of three integration failure modes, and concrete recommendations for downstream studies that depend on a unified biomedical KG. 
We argue that future KG integration work should report per-type coverage and per-cluster confidence rather than aggregate match rates.
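To make the embedding-based grouping step concrete, here is a minimal sketch of threshold grouping on cosine similarity. The abstract only states that ClinicalBERT similarities were grouped at a threshold of 0.98. The linkage rule used here (connected components via union-find, i.e. single linkage), the function name `group_by_cosine`, and the toy vectors are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

def group_by_cosine(embeddings, threshold=0.98):
    """Group rows whose cosine similarity is >= threshold (single linkage).

    Assumes every embedding is non-zero. Returns sorted lists of row indices.
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T

    parent = list(range(len(x)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair (i < j) that clears the threshold.
    rows, cols = np.where(np.triu(sim >= threshold, k=1))
    for i, j in zip(rows, cols):
        ri, rj = find(int(i)), find(int(j))
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(len(x)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Under this linkage rule, membership is transitive: if A is close to B and B is close to C, all three land in one group even when A and C fall below the threshold. That is one way a high pairwise threshold can still merge entities that should stay distinct, which is why per-cluster inspection matters more than the aggregate group count.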

5
In vitro splice-switching oligonucleotide rescues aberrant GFM2 pseudoexon inclusion and restores mitochondrial activity

Gross, S.; Birnbaum, R.; Shaul Lotan, N.; Mor-Shaked, H.; Manor, J.; Shaag, A.; Rosenbluh, C.; Levy-Memo, A.; Yanovsky-Dagan, S.; Saada, A.; Harel, T.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.28.26354078 medRxiv
Top 14%
1.2%
Background: Biallelic variants in GFM2, encoding mitochondrial elongation factor G2 (mtEFG2), a GTPase involved in the termination stage of mitochondrial translation, cause autosomal recessive combined oxidative phosphorylation deficiency. Noncoding structural variants may be missed by exome sequencing but can disrupt splicing and provide opportunities for variant-specific therapeutic rescue. We investigated the molecular mechanism underlying suspected Leigh syndrome in an infant with mitochondrial disease and evaluated whether splice-switching oligonucleotide (SSO) treatment could correct the pathogenic splicing defect. Methods: The proband underwent exome sequencing followed by short-read and long-read whole genome sequencing. RNA sequencing, reverse-transcription PCR, quantitative PCR, and cycloheximide treatment were used to characterize the effect of the identified intronic duplication on GFM2 splicing and transcript stability. Patient-derived fibroblasts were treated with SSOs targeting the aberrant splice junction. Rescue was assessed by RNA studies, western blotting, and spectrophotometric measurement of cytochrome c oxidase (COX). Results: Whole genome sequencing identified a paternally-inherited GFM2 missense variant, NM_032380.5:c.2195C>T p.(Pro732Leu), in trans to a maternally-inherited 221-nucleotide intronic duplication, NM_032380.5:c.2029-741_2029-521dup. RNA studies revealed an 87-nucleotide pseudoexon, generated by activation of a cryptic acceptor splice site within the duplicated sequence. The resulting transcript harbored a premature termination codon (PTC) and underwent nonsense-mediated decay, as confirmed by cycloheximide rescue. Together with reduced mtEFG2 protein levels on western blot, the findings supported a loss-of-function mechanism.
Enzymatic analysis of affected fibroblasts showed reduced activity of the mtDNA-dependent complex IV subunit COX, with preservation of the nuclear-encoded complex II enzyme succinate dehydrogenase and the control enzyme citrate synthase, consistent with impaired mitochondrial translation. An SSO targeting the aberrant intron-pseudoexon junction nearly abolished pseudoexon inclusion, restored correctly spliced GFM2 transcript from the duplication-containing allele, increased mtEFG2 protein levels, and significantly improved COX activity. Conclusions: This study identifies a pathogenic intronic GFM2 duplication that causes mitochondrial disease through pseudoexon activation and nonsense-mediated decay. The findings demonstrate the value of integrated genome and transcriptome analysis for exome-negative mitochondrial disease and provide in-vitro proof of concept that SSOs can restore transcript processing, protein expression, and mitochondrial respiratory-chain function in patient-derived cells.

6
High-dimensional Characterization of Genome-Environment Fitness Landscapes in Klebsiella pneumoniae

Zhou, G.; Williams, G.; Millner, M. T.; AlHirayban, R.; Alosaimi, W.; Fallatah, O.; Hart, A. J.; Malaikah, M.; Iftikhar, S.; Ahmad, H.; Roghanian, M.; Mustonen, V.; AlYami, R.; Banzhaf, M.; Moradigaravand, D.

2026-05-30 genetic and genomic medicine 10.64898/2026.05.28.26354339 medRxiv
Top 16%
0.9%
Background Bacterial fitness is shaped by interactions between genome variation and environmental context, yet how these interactions determine its predictability and heritability remains unclear. In the clinically important pathogen Klebsiella pneumoniae, a leading cause of hospital-acquired infections, this question is particularly pressing. Despite extensive genomic characterization, we still lack a systematic understanding of how genome-wide variation translates into fitness across diverse environments in K. pneumoniae. Methods We filled this gap by profiling a systematic collection of 1,462 clinical K. pneumoniae isolates across 214 diverse environmental and pharmacological stress conditions using high-throughput chemical genomics. Fitness was quantified from colony growth and integrated with whole-genome sequencing data. Genome-wide association analyses identified genetic determinants of fitness, and machine learning models incorporating genomic features were used to predict fitness. Results Fitness exhibited a strongly environment-dependent genetic architecture, with modest but significant concordance between genetic background and phenotypic variation. Under antibiotic and stress-combination conditions, fitness was driven by discrete, high-effect determinants, including known resistance genes, resulting in stronger signals and improved predictability. In contrast, non-antibiotic environments showed more polygenic and distributed architectures with weaker associations. Genome-wide analyses identified both established and previously uncharacterized genes linked with fitness across conditions. Resistance and virulence determinants exhibited clear context-dependent trade-offs, conferring fitness advantages under selection but imposing costs in non-selective environments. Consistent with this, plasmid carriage showed environment- and genotype-dependent fitness effects, with benefits under antibiotic pressure and measurable costs otherwise.
Genomic variant-based models for fitness prediction achieved moderate performance (mean Spearman correlation (ρ) = 0.36 (95% CI: 0.18-0.67) for predicted versus observed values in unseen data) across conditions, with improved accuracy under strong antibiotic selective pressures, and produced well-calibrated prediction intervals with high coverage. Despite a strong population-structure effect on predictions, models captured predictive gene and SNP biomarkers for fitness. Conclusion These findings highlight that bacterial fitness is an emergent property of genome-environment interactions rather than a fixed attribute of genotype. This work establishes a unified high-dimensional genotype-phenotype framework linking genomic variation to fitness across diverse conditions in a major pathogen, with broader implications for other pathogenic bacterial species.

7
Translational bioinformatics and machine learning framework for biomarker discovery, disease prediction, and patient profiling for precision medicine

Ahmed, Z.; Govindareddy, P.; DeGroat, W.; Narayanan, R.; Peker, E.; Zeeshan, S.

2026-05-27 genetic and genomic medicine 10.64898/2026.05.23.26353961 medRxiv
Top 17%
0.8%
Precision medicine aims to move from a "one-size-fits-all" approach to personalized and predictive healthcare across diverse populations. It promotes integration of multi-omics and phenotypic data to understand disease mechanisms and discover novel biomarkers and risk factors, which could be used to predict and prevent critical diseases in individual patients across diverse populations. The precision medicine approach can accelerate our ability to classify patients at higher risk of developing critical diseases, improve diagnostic capabilities, develop deeper understanding of individual risk, investigate racial differences and demographic characteristics, and find relationships between genetic variants, expressions, and diseases. This study focuses on implementing an innovative and data-driven framework of translational bioinformatics and Machine Learning (ML) techniques to analyze multi-omics data, including RNA-seq and Whole-Genome Sequencing (WGS) data, generated using blood samples of randomly consented patients. First, we utilized bioinformatics pipelines to identify differentially expressed genes and their pathogenic and likely pathogenic variants for the downstream data analysis, annotation, and visualization. We then applied a nexus of ML models for multi-omics biomarker discovery, disease prediction, density-based clustering, single-patient profiling, and pathogenicity classification. WGS data analysis supported the exploration of genetic variation and diversity among patients to identify known and novel biomarkers, whereas RNA-seq data analysis improved our understanding of the functional and biological pathways that underlie disease states. We classified and clustered pathogenic variants and expressions across various genes and discovered numerous disease-leading risk factors.
Our results include gene-disease associations and capture common pathways across the broader population, demonstrating a level of sensitivity and accuracy that has broad clinical implications. We validated our results through clinical records and state-of-the-science literature. This study delves into the strengths of multi-omics data integration and capabilities of ML application in genetically diverse and complex patient cohorts. Our approach has the potential to elucidate complex gene-disease interactions for genetically diverse populations, which can support earlier diagnoses for patients in many disease realms.

8
Measuring the Meaning of Genomic Results: Harmonization of the Metric for Case-Level Results in the CSER2 Consortium

Powell, B. C.; Amendola, L. M.; Bonini, K. E.; Crosslin, D.; Desrosiers-Battu, L.; Hiatt, S. M.; Hindorff, L.; Kenny, E. E.; Mavura, Y.; Muenzen Ferar, K. D.; Risch, N.; Roman, T.; Slavotinek, A.; Van Ziffle, J.; Bowling, K. M.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.28.26354388 medRxiv
Top 18%
0.7%
Yield of reported results from genetic testing provides a proximal measure of clinical usefulness. While ACMG/AMP guidelines provide representations of uncertainty for individual genetic variant classification, additional factors are considered when determining whether results explain a patient's presentation. To standardize cross-consortium analysis, a working group of the Clinical Sequencing Evidence-Generating Research (CSER2) consortium iteratively identified factors used when contextualizing variant-level results to case-level interpretation (i.e., interpretation of an individual's genetic data with respect to the indication for testing). Sites independently categorized results; complex cases were discussed collaboratively, leading to revision of classification categories. Our metric incorporates factors beyond classification of reported variants. Analogous to variant-level results, "Definitive Positive" and "Probable Positive" represent certainty that results may be clinically explanatory. The category "Inconclusive" applies when results may or may not fully explain the patient presentation, with subdivision into multiple (non-exclusive) subcategories. Cases falling outside all of the other categories are considered "Negative". The overall diagnostic yield by this metric and use of categories for inconclusive results varied by CSER project, in part paralleling study design differences. This case-level categorization provides a meaningful assessment of diagnostic yield, and for inconclusive cases identifies potentially resolvable factors for case resolution.

9
The Impact of Non-coding G-quadruplex Variants on Human Traits and Disease Susceptibility

Sharma, R.; Hu, F.; Li, X.; Campos, R.; Kundu, K.; Atanur, S.; Karpinski, M.; Wasilewski, S.; MacArthur, S.; Vitsios, D.; Dhindsa, R. S.; Georgakopoulos-Soares, I.; Burren, O. S.; Petrovski, S.; Mustoe, A. M.; Wang, Q.; Glodzik, D.; Zou, X. Z.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.29.26354456 medRxiv
Top 20%
0.7%
Non-coding variants are important contributors to human traits and diseases but linking them to molecular mechanisms and phenotypes at scale remains challenging. G-quadruplexes (G4s) are four-stranded structures formed by guanine-rich sequences and have emerged as key functional elements within the non-coding genome. G4s are enriched in regulatory regions and can modulate gene expression at both the DNA and RNA levels, influencing transcription, replication, and RNA processing, positioning them as key mediators linking non-coding variation to complex biological traits. Here, we profile putative G4s across five regulatory regions in 459,449 UK Biobank genomes and perform phenome-wide association analyses spanning 2,941 plasma protein abundances, 13,321 binary traits, and 1,682 quantitative traits. We show that putative G4-modifying variants are depleted under purifying selection despite elevated local mutability and drive large, bidirectional associations with plasma proteins and clinical traits, including associations not captured by coding variants. Using a mechanism-aware collapsing strategy that groups rare non-coding variants by their predicted impact on G4 stability, we achieved stronger gene-level signals than those obtained with standard rare-variant collapsing approaches. Integrating non-coding and protein-truncating variants (PTVs) increases discovery power, revealing 843 significant associations missed by the PTV-only model. Replication in the Alliance for Genomic Discovery cohort demonstrates cross-cohort robustness. Our study suggests G4s as widespread mediators of non-coding regulation and provides a framework for mechanism-informed target discovery and prioritization across the non-coding genome.

10
Integrative Genetic Analyses of Lipid Metabolism and Multiple Sclerosis Severity Using Metabolome-Wide and Cis-Mendelian Randomization

Noroozi, R.; Higgins Tejera, C.; Chen, M.; Briggs, F. B. S.; Bhargava, P.; Fitzgerald, K. C.

2026-05-29 neurology 10.64898/2026.05.27.26354239 medRxiv
Top 24%
0.4%
The course of multiple sclerosis (MS) is highly heterogeneous, yet the biological mechanisms underlying this variability remain incompletely understood. Although metabolic alterations have increasingly been associated with disease progression, existing observational evidence is limited by confounding, reverse causation, and an inability to establish causal mechanisms. To bridge this gap, we used a metabolome-wide Mendelian Randomization (MR) framework, including thorough sensitivity analyses, to identify metabolites genetically linked to MS severity that can causally affect it. Bidirectional MR analyses revealed a subset of amino acid and lipid pathways with strong, consistent effects across different MR approaches, confirmed by tests for heterogeneity, horizontal pleiotropy, and LD confounding. For metabolites prioritized by metabolome-wide MR with evidence of causal effects, we conducted genetic colocalization at loci encompassing proximal enzyme-encoding genes, leveraging the corresponding instrumental variants to assess shared underlying genetic signals. This process revealed shared genetic signals between metabolite levels and MS severity, mapped to the FADS1/2 and CYP4F2 loci. A subsequent pathway-resolved set of cis-MR analyses across FADS1/2-derived polyunsaturated fatty acid (PUFA) metabolites, using a functional variant that proxies reduced Δ5-desaturase activity, showed consistent effects indicating that FADS1 perturbation is associated with MS severity. Collectively, these results highlight FADS1 as a key driver of PUFA-related causal effects on MS severity in both systemic (circulating metabolites) and brain cell-specific contexts. Additional supportive cis-MR evidence implicates the disruption of CYP4F2 as another PUFA-metabolizing enzyme.

11
TopBrain Segmentation Challenge for Whole Brain Vessel Anatomy

Yang, K.; Shi, P.; Huang, H.; Musio, F.; Baazaoui, H.; Aydin, O. U.; Hilbert, A.; Hamadache, R. E.; Yalcin, C.; Zhang, M.; Falcetta, D.; de la Rosa, E.; Shit, S.; Prabhakar, C.; Wittmann, B.; Rokuss, M. R.; Kirchhoff, Y.; Al-Maskari, R.; Hoeher, L.; Juchler, N.; Casamitjana, A.; Cleary, J.; Schmick, A.; Baumgartner, P.; Deseoe, J.; Vandans, O.; Lee, D.; Oh, K.; LaBella, D.; Mazher, M.; Niederer, S. A.; Qayyum, A.; Liu, Y.; Chen, J.; Kim, W.; Asawalertsak, N.; Kim, M.; Shin, D.; Park, S.-H.; Kikuchi, S.; Zhang, Y.; Liu, J.; Cui, Y.; Qiu, Y.; Verschuur, A.; Zhang, J.; van der Schaaf, I.; Su, R.;

2026-05-30 radiology and imaging 10.64898/2026.05.28.26354312 medRxiv
Top 27%
0.3%
We present the TopBrain 2025 Challenge, the first benchmark for fine-grained multiclass segmentation of the whole brain vasculature in both computed tomography angiography (CTA) and magnetic resonance angiography (MRA). Building on the TopCoW challenge, TopBrain scales vessel annotation from the Circle of Willis to the entire brain, introducing a dataset of 90 annotated volumes across 48 landmark vessel classes spanning arterial and venous systems, of which 50 training volumes are publicly released. Vessel definitions were consolidated from established neuroanatomical references into a unified annotation scheme, and vessel caliber measurements along the centerline are reported for the first time across the whole brain vascular anatomy. To address the unique challenges of multiclass brain vessel segmentation, we propose an evaluation framework that accounts for detection in segmentation performance, assesses anatomical plausibility, and introduces novel contamination metrics that characterize inter-class prediction errors. Fifteen teams from over 220 registered participants submitted algorithms to the benchmark. The top-performing teams built on nnUNet with principled system design choices, achieving around 80% Dice scores, near-zero invalid neighbor counts, over 60% F1 scores for side-road vessels, and below 18% foreground contamination ratio. Larger vessels are easier to segment, while smaller and more complex vessels remain the true bottleneck. The annotated datasets and podium-finish algorithms are made publicly available on Zenodo.

12
Multimodal axes reveal individualized amyloid-β, tau, and neurodegeneration coupling in aging and Alzheimer's disease

Poulakis, K.; Ioannou, K.; Bezgin, G.; Chiotis, K.; Iturria-Medina, Y.

2026-05-26 neurology 10.64898/2026.05.24.26353955 medRxiv
Top 29%
0.3%
Can we decode Alzheimer's disease (AD) heterogeneity into a few portable axes that capture how amyloid-β, tau and neurodegeneration (A-T-N) spatially co-vary in vivo? To answer this question, we built a pipeline that harmonizes longitudinal amyloid-β/tau PET and T1 MRI (gray matter) from the ADNI cohort (12,430 images) with mixed-effects modeling and then derived stage-specific multimodal axes (mVCs) using linked component analysis, with robustness tested in simulations and external validation in the OASIS cohort (4,958 images). We identified a small set of multimodal axes that (i) recapitulate early tau-weighted variation in cognitively unimpaired (CU) individuals, AD-like A-T-N coupling in cognitively impaired (CI) individuals and atypical CU and CI participants with posterior (precuneus/occipitoparietal) and fronto-insular/frontal-weighted patterns, (ii) map onto domain-specific cognition, APOE e4, and blood/CSF biomarkers of neurodegeneration, neuroaxonal injury and astrocyte activation, (iii) predict clinical transitions, (iv) generalize in an independent cohort, and (v) demonstrate modelling robustness to missing data, high dimensionality, and cross-cohort variability, enabling direct application of the extracted axes to new datasets for biomarker discovery and stratification. Multimodal axes provide a portable, interpretable layer for quantifying amyloid-β-tau-neurodegeneration coupling at the individual level, complementing current biomarker-based staging frameworks based on A-T-N status and tau PET topography, and can be computed on new datasets to aid clinical assessment and trial enrichment.

13
Advanced Multimodal AI for Predicting Long-Term Functional Outcomes After Ischemic Stroke Using Only Admission Data

McBride, F.; Huang, H.; Kapoor, A. K.; Oermann, E.; Frontera, J. A.; Razavian, N.

2026-05-29 neurology 10.64898/2026.05.27.26354289 medRxiv
Top 29%
0.3%
Background and Purpose Prognostication after acute ischemic stroke often relies on limited variables and simple risk scores, despite richer information being available at admission. We developed a multimodal AI model using admission data to predict modified Rankin Scale (mRS) outcomes and compared it to established tools. Methods In a retrospective study of ischemic stroke/TIA patients, we trained three modality-specific models on admission non-contrast head CT, history and physical notes, and structured clinical variables, and combined them in a weighted-average ensemble. We predicted binary (mRS 0-2 versus 3-6) and ordinal mRS (0-6) outcomes at discharge and 90 days. Performance on an external test cohort was compared with THRIVE and SPAN-100 scores using AUROC, AUPRC, Brier score, mean absolute error (MAE), and quadratic weighted kappa (QWK). Results A total of 6,915 patients were split into training, validation and testing cohorts in a 3:1:1 ratio. For discharge binary mRS (n=1596), the multimodal ensemble achieved significantly better discrimination (AUROC 0.859, AUPRC 0.858) with 25-61% lower Brier scores than THRIVE or SPAN-100 (all p<0.001). For 90-day binary mRS (n=207), the model also outperformed both THRIVE and SPAN-100 (AUROC 0.838, AUPRC 0.805, with 3-38% lower Brier scores). Ordinal mRS prediction showed similarly strong performance with significantly better QWK at discharge and numerically lower MAE. The multimodal ensemble model reassigned about one-third of patients to different risk categories versus THRIVE and was closer to the true discharge outcome in ~74% of discordant cases. Conclusions We developed a well-calibrated multimodal AI model for prediction of discharge and 90-day post-stroke functional outcomes using only data present at the time of admission. This model outperforms existing prognostic tools and can support early clinical decision-making.
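The weighted-average ensemble and the Brier score mentioned in this abstract can be sketched in a few lines. This is only an illustration of the two generic techniques: the function names, the example weights, and the toy probabilities are assumptions, and the abstract does not say how the authors chose their weights.

```python
import numpy as np

def weighted_average_ensemble(probabilities, weights):
    """Combine per-model predicted probabilities with normalized weights.

    probabilities: array-like of shape (n_models, n_patients)
    weights: array-like of shape (n_models,), any positive scale
    """
    p = np.asarray(probabilities, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the output stays a probability
    return w @ p

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and binary outcome."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    return float(np.mean((p - y) ** 2))

# Toy example: three hypothetical modality models (imaging, notes, structured data)
# scoring two patients, with made-up weights.
p_models = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]
p_ensemble = weighted_average_ensemble(p_models, weights=[2, 1, 1])
score = brier_score([1, 0], p_ensemble)
```

A lower Brier score means the predicted probabilities sit closer to the observed outcomes, so it reflects both discrimination and calibration, unlike AUROC, which depends only on ranking.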

14
DISCERN: A Clinical Impact-aware Framework for Radiology Report Comparison

Sharma, R.; Beeche, C.; Dong, J.; Zhuang, R.; Qu, H.; Zhang, R.; Gangaram, V.; Goswami, P.; Xin, J.; Ballard, J.; Goldberg, A.; Sagreiya, H.; Long, Q.; Chen, T.; Witschey, W. R.

2026-05-27 radiology and imaging 10.64898/2026.05.26.26353612 medRxiv
Top 30%
0.3%

The surge in medical imaging has spurred the development of vision-language models (VLMs) to alleviate radiologist workloads. However, clinical deployment is hindered by the lack of meaningful evaluation frameworks. Current metrics - ranging from semantic similarity to large language model (LLM) based judges - often fail to distinguish between clinically trivial and critical discrepancies, poorly reflecting real-world clinical judgment. To address this, we introduce DISCERN (Discordance and Significance-aware Entity-level Radiology Report Comparison). DISCERN is a significance-aware framework that weighs report errors based on their potential impact on patient care. Our results demonstrate that DISCERN powered by closed-source LLMs aligns more closely with expert radiologist assessments than traditional metrics or current LLM evaluators, providing a more interpretable and clinically relevant benchmark. By modeling radiologist prioritization and entity-level feedback, DISCERN facilitates targeted model refinement and ensures the safer integration of generative AI into clinical workflows.

15
Local ancestry-aware genome-wide meta-analysis uncovers novel genetic loci for sickle cell disease nephropathy

Garrett, M. E.; Nouraie, S. M.; Machado, R. F.; Gordeuk, V. R.; Gladwin, M. T.; NHLBI Trans-Omics for Precision Medicine Consortium; Telen, M. J.; Ashley-Koch, A. E.

2026-05-30 genetic and genomic medicine 10.64898/2026.05.27.26354213 medRxiv
Top 32%
0.2%

In the United States, sickle cell disease (SCD) is a rare inherited hemoglobinopathy affecting about 100,000 individuals, mostly of African ancestry. SCD causes damage to multiple organ systems and SCD nephropathy (SCDN) is a common complication associated with early mortality. We previously performed a genome-wide association study (GWAS) for SCDN and identified a modest number of genome-wide significant loci. Here, we leveraged the ancestral composition of participants from two well-characterized adult SCD cohorts to boost statistical power and perform a local ancestry-aware GWAS for estimated glomerular filtration rate (eGFR), resulting in the identification of novel genome-wide significant loci within the African (AFR) and European (EUR) ancestral components of participants. Meta-analysis identified 12 significant genomic regions in the AFR tract, including PPIL6, ARHGAP24, RAB11A, and STEAP3, and 38 regions in the EUR tract, including UBLCP1, ADAMTS6, JAZF1, MYO7B, MYO1C, PDGFA, GPC5, LRP1B, KANK1, and TRPV5. The identified regions encompass genes affecting inflammation, extracellular matrix (ECM) integrity, iron metabolism, magnesium ion homeostasis, B cell apoptosis, tumor necrosis factor (TNF) production, and estrogen signaling. Many of these genes and pathways are important not only for renal function, but also for SCD biology, providing additional support for the hypothesis that SCDN pathophysiology is distinct from other forms of kidney disease. This study represents the largest local ancestry-aware analysis of SCDN to date, furthers our understanding of the genetic risk factors underlying SCDN, and proposes new targets that could be useful for the early identification and treatment of kidney dysfunction in SCD patients.

16
HIV Transmission Dynamics in Greater Mexico City are Shaped by Dense Spatial Mixing

Escalera, M.; Lopez Ortiz, E.; Garcia Morales, C.; Cruz-Bonilla, E.; Guerrero Flores, S.; Weaver, S.; Matias Florentino, M.; Tapia Trejo, D.; Davila Conn, V.; Roberto Cardenas Porras; Eduardo Zarza Sanchez; Silvia del Arenal Sanchez; Jorge A Gutierrez Soto; Karina Nava Memije; Jessica Monreal Flores; Alejandro Guzman; Rebecca E Garcia Mendiola; Patricia Iracheta; Veronica Ruiz Gonzalez; Veronica Quiroz Morales; Israel Macias Gonzalez; Manuel A Becerril Rodriguez; Raul A Cruz Flores; Andrea Gonzalez Rodriguez; Dulce M Lopez Sanchez; Miroslava Card

2026-05-27 hiv aids 10.64898/2026.05.26.26354122 medRxiv
Top 32%
0.2%

Understanding HIV transmission in densely populated urban settings is essential to mitigate ongoing epidemic spread. We present a comprehensive analysis of recent HIV transmission dynamics in Greater Mexico City, one of the world's largest metropolitan areas, comprising Mexico City and neighbouring municipalities of the State of Mexico. Drawing from over 7,000 complete pol gene sequences representing around 50% of new cases reported between 2019 and 2022 within the study region, we reconstructed the transmission network based on pairwise genetic distance. We identified ten large transmission clusters exhibiting sustained growth up to the most recent sampling period. We further analysed paired genetic and high-resolution human mobility data using an integrated phylogeographic approach. We observed a heterogeneous pattern of viral spread across the region, supported by extensive mixing at a wider geographic scale. Across Greater Mexico City, which has a high population density, HIV transmission is minimally spatially constrained, a pattern likely fuelled by intense human mobility. Thus, population movement weakens isolation by distance in large urban areas even for a chronic infection that is sexually and vertically transmitted. We demonstrate the value of integrating large-scale genetic, epidemiological, and mobility data to resolve contemporary HIV transmission dynamics in densely populated urban settings.

17
The dangers of data double dipping in assessing the classification accuracies of blood biomarkers in Alzheimer's disease and related disorder research

Liu, T.; Zeng, X.; Snitz, B. E.; Karikari, T. K.; Deek, R. A.

2026-06-01 neurology 10.64898/2026.05.22.26353848 medRxiv
Top 33%
0.2%

Blood biomarker models are increasingly used in Alzheimer's disease and related dementia translational research, but predictive performance can be inflated when the same dataset is used for both model development and evaluation. We assess the effect of data double dipping using simulations and NULISA proteomic data from the MYHAT-NI community-based cohort to predict brain amyloid-beta neuroimaging status. In both settings, training AUC increased as more biomarkers were added, while testing AUC peaked earlier and then declined. These findings show that data double dipping can inflate model performance and highlight the need for external validation or internal validation with data partitioning.
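The inflation described in this abstract can be reproduced with a small simulation. The sketch below is illustrative only and is not the authors' analysis: it assumes pure-noise "biomarkers" with no true signal, a hypothetical mean-difference classifier, and made-up sample sizes. Because there is no signal here, the held-out AUC should hover near 0.5 rather than peak and decline as reported for the real data, while the AUC computed on the training data itself rises as features are added.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: probability a positive outscores a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_noise_data(rng, n_per_class, n_features):
    """Features are standard normal noise, unrelated to the labels."""
    X = [[rng.gauss(0, 1) for _ in range(n_features)]
         for _ in range(2 * n_per_class)]
    y = [1] * n_per_class + [0] * n_per_class
    return X, y


def fit_mean_difference(X, y):
    """Weight each feature by the difference in class means (assumed toy model)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    return [
        sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        for j in range(len(X[0]))
    ]


def predict(weights, X):
    return [sum(w * v for w, v in zip(weights, row)) for row in X]


def train_and_test_auc(n_features, seed=0):
    """Fit on a small training set; score it on itself and on held-out data."""
    rng = random.Random(seed)
    X_train, y_train = make_noise_data(rng, 20, n_features)
    X_test, y_test = make_noise_data(rng, 100, n_features)
    weights = fit_mean_difference(X_train, y_train)
    return (auc(predict(weights, X_train), y_train),
            auc(predict(weights, X_test), y_test))


for n_features in (1, 10, 50):
    train_auc, test_auc = train_and_test_auc(n_features)
    print(f"{n_features:>2} features: train AUC={train_auc:.2f}, "
          f"test AUC={test_auc:.2f}")
```

The gap between the two columns is the double-dipping bias: the training AUC reuses the same samples that set the weights, so it rewards fitting noise.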

18
Multimodal atlas of human atherosclerosis links granular vascular cell states to coronary artery disease risk

Mosquera, J. V.; Tang, I.; Murach, M.; Auguste, G.; Kodali, A.; Hart, P.; Shaw, D. M.; Li, M.; Turner, A. W.; Hodonsky, C. J.; Dworak, N. M.; de Oliveira, A. K.; Sol-Church, K.; Jhee, T.; van der Sijs, K. I. M.; Adkar, S. S.; Choi, R. B.; Vacante, F.; Wu, J. C.; Cheng, P.; Giannarelli, C.; Leeper, N. J.; Finn, A. V.; Bjorkegren, J. L. M.; Kovacic, J. C.; Yurdagul, A.; van der Laan, S. W.; Miller, C. L.

2026-05-26 cardiovascular medicine 10.64898/2026.05.24.26353986 medRxiv
Top 36%
0.2%

Advances in single-cell and spatial assays have revolutionized the scale and resolution of molecular tissue profiling. Here we present MetaPlaq, a multimodal atlas of human atherosclerotic arterial beds comprising over a million cells across single-cell transcriptomics, epigenomics and high-resolution spatial expression assays. We map granular cell states and disease-relevant transcriptional programs within the native tissue context of coronary arteries. Furthermore, we map cardiovascular GWAS signals to smooth muscle cells (SMCs) and endothelial cells (ECs) and uncover the cis-regulatory architecture governing their phenotypic transitions. Our comprehensive epigenomic reference allowed us to build cell-specific enhancer-gene link maps and multimodal gene regulatory networks (GRNs) underlying disease-relevant states such as osteogenic SMCs and ECs undergoing mesenchymal transition. We also integrate SMC and EC disease-associated gene sets with GRNs to nominate key transcription factors such as PRRX1, BNC2 and ELK3 regulating atherosclerosis-relevant transcriptional programs. Finally, we layer single-cell and spatial modalities to fine-map GWAS variants with improved cell and anatomical context. We highlight candidate cell-specific regulatory mechanisms at less characterized CAD loci, including FGD5 and MCF2L in ECs. Together, this atlas represents an important step towards fully interpreting genetic risk loci and informing new therapeutic strategies for cardiovascular disease.

19
Intravital mid-infrared biosensing by normalized spatial probing of self-referenced optothermal signals

Berger, C. G.; Puttfarcken, B.; Qiu, J.; Hauer, I.; Herr, S.; Juestel, D.; Pleitez, M. A.

2026-05-28 endocrinology 10.64898/2026.05.27.26354202 medRxiv
Top 36%
0.2%

We present a compact pump-and-probe mid-infrared Optothermal Spectrometer (OTHES) equipped with Spatial Probing and Autocorrection (SPAC) optimized for robust intravital application in humans. SPAC-OTHES facilitates alignment stability and spectral comparability across different measurement sessions involving different skin types. In contrast to the state of the art, SPAC-OTHES uses camera-based beam detection and an auto-calibration mechanism that enables ca. 73% better spectral reproducibility in intravital measurements in human volunteers than non-calibrated readouts. Moreover, SPAC-OTHES has the potential to lower the glucose quantification error, as demonstrated here in artificial skin phantoms, where an improvement of 52% compared to conventional diode-based detection was observed. The compactness of OTHES, combined with reliable SPAC-readout, has the potential to accelerate commercialization and broad application of biosensors based on mid-infrared spectroscopy.

20
Assessing Lipid Core Burden Index with Depolarization-Sensitive Optical Frequency Domain Imaging

Jones, G.; Otsuka, K.; Fujisawa, N.; Yamaura, H.; Matsumoto, K.; Okamoto, A.; Yamaguchi, T.; Shimada, T.; Kagawa, S.; Yamazaki, T.; Akasaka, T.; Bouma, B. E.; Villiger, M.; Fukuda, D.

2026-06-01 cardiovascular medicine 10.64898/2026.05.22.26353889 medRxiv
Top 36%
0.2%

Background: Quantitative lipid assessment is central to identifying rupture-prone coronary plaques and represents a therapeutic target for lipid-lowering therapy. Near-infrared spectroscopy (NIRS)-derived lipid core burden index (LCBI) is well validated and widely used for detecting lipid-rich lesions. Optical frequency domain imaging (OFDI) is increasingly adopted for guiding percutaneous coronary intervention (PCI) due to its high-resolution structural imaging capabilities. Depolarization-sensitive OFDI (depOFDI) provides intrinsic lipid contrast and may enable combined structural and compositional plaque characterization within a single OFDI-based platform. Objective: To define an OFDI-derived lipid metric and evaluate its agreement with NIRS-derived LCBI. Methods: Thirty-three patients underwent both polarization-sensitive OFDI and NIRS-intravascular ultrasound imaging during PCI. After exclusion of 4 datasets, 29 co-registered pullbacks were analyzed. A signal-to-noise-corrected depolarization metric was used to identify lipid-rich regions and generate depOFDI chemograms. maxLCBI4mm value and location, as well as total LCBI, were computed and compared with NIRS. Results: depOFDI demonstrated strong agreement with NIRS, showing high correlation for maxLCBI4mm (r^2 = 0.862) and total LCBI (r^2 = 0.867), along with strong spatial concordance for the location of the maxLCBI4mm (r^2 = 0.900). Bland-Altman analysis of LCBI4mm showed minimal bias (10.7) with 95% limits of agreement of [-81.4 to 102.8]. Conclusions: depOFDI enables accurate quantification of lipid burden alongside the high-resolution structural information inherently provided by OFDI. Because depolarization metrics can be derived from polarization-diverse detection available in many commercial OFDI systems, this approach provides a practical pathway toward comprehensive plaque characterization within existing PCI workflows, without the need for additional imaging modalities.
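For readers unfamiliar with the Bland-Altman statistics quoted in this abstract, the following minimal sketch shows how a bias and 95% limits of agreement are conventionally computed from paired measurements. The function name `bland_altman` and the sample readings are hypothetical and are not data from the study; the abstract does not state which variant of the method the authors used, so the standard form (mean difference plus or minus 1.96 sample standard deviations) is assumed.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Bias is the mean of the paired differences; the 95% limits of
    agreement are bias +/- 1.96 * sample SD of those differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical paired LCBI-style readings, for illustration only.
dep_ofdi = [310.0, 120.0, 455.0, 205.0, 380.0]
nirs = [295.0, 140.0, 430.0, 210.0, 350.0]

bias, lower, upper = bland_altman(dep_ofdi, nirs)
print(f"bias={bias:.1f}, limits of agreement=[{lower:.1f}, {upper:.1f}]")
```

By construction the limits are symmetric about the bias, so the bias always equals the midpoint of the lower and upper limits.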